Overview

Dataset statistics

Number of variables: 12
Number of observations: 150000
Missing cells: 33655
Missing cells (%): 1.9%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 13.7 MiB
Average record size in memory: 96.0 B
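
The derived figures in this block follow from the raw counts. The sketch below redoes the arithmetic. It assumes every cell occupies 8 bytes, which is not stated in the report but is consistent with a 96.0 B record spread over 12 variables.

```python
# Counts copied from the report. BYTES_PER_CELL is an assumption (8-byte cells),
# chosen because 12 variables * 8 bytes matches the reported 96.0 B record size.
n_variables = 12
n_observations = 150_000
missing_cells = 33_655
BYTES_PER_CELL = 8  # assumed, not stated in the report

total_cells = n_variables * n_observations
missing_pct = 100 * missing_cells / total_cells

record_bytes = n_variables * BYTES_PER_CELL
total_mib = record_bytes * n_observations / 2**20

print(f"Missing cells (%): {missing_pct:.1f}%")
print(f"Average record size: {record_bytes:.1f} B")
print(f"Total size in memory: {total_mib:.1f} MiB")
```

Under that assumption the three printed values round to the figures reported above (1.9%, 96.0 B, 13.7 MiB).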

Variable types

NUM: 11
BOOL: 1

Reproduction

Analysis started: 2020-08-11 02:00:25.415873
Analysis finished: 2020-08-11 02:01:17.018433
Duration: 51.6 seconds
Version: pandas-profiling v2.8.0
Command line: pandas_profiling --config_file config.yaml [YOUR_FILE.csv]
Download configuration: config.yaml

Warnings

NumberOfTimes90DaysLate is highly correlated with NumberOfTime30-59DaysPastDueNotWorse and 1 other field (High correlation)
NumberOfTime30-59DaysPastDueNotWorse is highly correlated with NumberOfTimes90DaysLate and 1 other field (High correlation)
NumberOfTime60-89DaysPastDueNotWorse is highly correlated with NumberOfTime30-59DaysPastDueNotWorse and 1 other field (High correlation)
MonthlyIncome has 29731 (19.8%) missing values (Missing)
NumberOfDependents has 3924 (2.6%) missing values (Missing)
RevolvingUtilizationOfUnsecuredLines is highly skewed (γ1 = 97.63157449) (Skewed)
NumberOfTime30-59DaysPastDueNotWorse is highly skewed (γ1 = 22.59710756) (Skewed)
DebtRatio is highly skewed (γ1 = 95.15779287) (Skewed)
MonthlyIncome is highly skewed (γ1 = 114.0403179) (Skewed)
NumberOfTimes90DaysLate is highly skewed (γ1 = 23.08734547) (Skewed)
NumberOfTime60-89DaysPastDueNotWorse is highly skewed (γ1 = 23.33174312) (Skewed)
Unnamed: 0 has unique values (Unique)
RevolvingUtilizationOfUnsecuredLines has 10878 (7.3%) zeros (Zeros)
NumberOfTime30-59DaysPastDueNotWorse has 126018 (84.0%) zeros (Zeros)
DebtRatio has 4113 (2.7%) zeros (Zeros)
MonthlyIncome has 1634 (1.1%) zeros (Zeros)
NumberOfOpenCreditLinesAndLoans has 1888 (1.3%) zeros (Zeros)
NumberOfTimes90DaysLate has 141662 (94.4%) zeros (Zeros)
NumberRealEstateLoansOrLines has 56188 (37.5%) zeros (Zeros)
NumberOfTime60-89DaysPastDueNotWorse has 142396 (94.9%) zeros (Zeros)
NumberOfDependents has 86902 (57.9%) zeros (Zeros)
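
The skewness warnings report γ1, the third standardized moment. The sketch below computes the plain moment coefficient on a made-up toy column. The data are hypothetical, and the report's values may come from a bias-corrected estimator, so this shows the idea rather than reproducing the exact figures.

```python
from statistics import fmean

def skewness(values):
    """Moment coefficient of skewness g1 = m3 / m2**1.5 (no bias correction)."""
    n = len(values)
    mean = fmean(values)
    m2 = sum((v - mean) ** 2 for v in values) / n  # second central moment
    m3 = sum((v - mean) ** 3 for v in values) / n  # third central moment
    return m3 / m2 ** 1.5

# Hypothetical toy column: mostly zeros plus one extreme value, loosely
# mimicking the zero-heavy delinquency counts flagged in the warnings.
toy = [0] * 99 + [98]
symmetric = [1, 2, 3, 4, 5]

print(skewness(toy))        # large positive value: long right tail
print(skewness(symmetric))  # zero for a symmetric sample
```

A single extreme value on top of a mass of zeros is enough to push the coefficient far from zero, which is the pattern the flagged columns share.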

Variables

Unnamed: 0
Real number (ℝ≥0)

UNIQUE

Distinct count: 150000
Unique (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 75000.5
Minimum: 1
Maximum: 150000
Zeros: 0
Zeros (%): 0.0%
Memory size: 1.1 MiB

Quantile statistics

Minimum: 1
5-th percentile: 7500.95
Q1: 37500.75
Median: 75000.5
Q3: 112500.25
95-th percentile: 142500.05
Maximum: 150000
Range: 149999
Interquartile range (IQR): 74999.5

Descriptive statistics

Standard deviation: 43301.41453
Coefficient of variation (CV): 0.5773483447
Kurtosis: -1.2
Mean: 75000.5
Median Absolute Deviation (MAD): 37500
Skewness: 0
Sum: 1.1250075e+10
Variance: 1875012500
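
Unnamed: 0 takes every integer from 1 to 150000 exactly once (minimum 1, maximum 150000, 150000 distinct values), so its statistics have closed forms. The sketch below checks them, assuming the report uses the sample variance with an n - 1 denominator, which the figures above are consistent with.

```python
import math

n = 150_000  # the column holds 1, 2, ..., n

mean = (n + 1) / 2            # mean of 1..n
variance = n * (n + 1) / 12   # sample variance (n - 1 denominator) of 1..n
std = math.sqrt(variance)
cv = std / mean               # coefficient of variation
iqr = 112500.25 - 37500.75    # Q3 - Q1, taken from the quantile statistics

print(mean, variance, round(std, 5), round(cv, 10), iqr)
```

The results agree with the reported mean, variance, standard deviation, CV and IQR, which suggests the column is simply a row counter carried over from the source file.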
Histogram with fixed size bins (bins=10)

Common values
Value | Count | Frequency (%)
2047 | 1 | < 0.1%
107806 | 1 | < 0.1%
9518 | 1 | < 0.1%
15661 | 1 | < 0.1%
13612 | 1 | < 0.1%
3371 | 1 | < 0.1%
1322 | 1 | < 0.1%
7465 | 1 | < 0.1%
5416 | 1 | < 0.1%
27943 | 1 | < 0.1%
Other values (149990) | 149990 | > 99.9%

Minimum 5 values
Value | Count | Frequency (%)
1 | 1 | < 0.1%
2 | 1 | < 0.1%
3 | 1 | < 0.1%
4 | 1 | < 0.1%
5 | 1 | < 0.1%

Maximum 5 values
Value | Count | Frequency (%)
150000 | 1 | < 0.1%
149999 | 1 | < 0.1%
149998 | 1 | < 0.1%
149997 | 1 | < 0.1%
149996 | 1 | < 0.1%

SeriousDlqin2yrs
Boolean

Distinct count: 2
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 1.1 MiB

Value | Count | Frequency (%)
0 | 139974 | 93.3%
1 | 10026 | 6.7%

RevolvingUtilizationOfUnsecuredLines
Real number (ℝ≥0)

SKEWED
ZEROS

Distinct count: 125728
Unique (%): 83.8%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 6.048438054666888
Minimum: 0.0
Maximum: 50708.0
Zeros: 10878
Zeros (%): 7.3%
Memory size: 1.1 MiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0.029867442
Median: 0.154180737
Q3: 0.5590462475
95-th percentile: 0.9999999
Maximum: 50708
Range: 50708
Interquartile range (IQR): 0.5291788055

Descriptive statistics

Standard deviation: 249.7553706
Coefficient of variation (CV): 41.29254005
Kurtosis: 14544.71341
Mean: 6.048438055
Median Absolute Deviation (MAD): 0.148325347
Skewness: 97.63157449
Sum: 907265.7082
Variance: 62377.74516
Histogram with fixed size bins (bins=10)

Common values
Value | Count | Frequency (%)
0 | 10878 | 7.3%
0.9999999 | 10256 | 6.8%
1 | 17 | < 0.1%
0.9500998 | 8 | < 0.1%
0.71314741 | 6 | < 0.1%
0.007984032 | 6 | < 0.1%
0.954091816 | 6 | < 0.1%
0.796407186 | 5 | < 0.1%
0.850299401 | 5 | < 0.1%
0.538922156 | 5 | < 0.1%
Other values (125718) | 128808 | 85.9%

Minimum 5 values
Value | Count | Frequency (%)
0 | 10878 | 7.3%
8.37e-06 | 1 | < 0.1%
9.93e-06 | 1 | < 0.1%
1.25e-05 | 1 | < 0.1%
1.43e-05 | 1 | < 0.1%

Maximum 5 values
Value | Count | Frequency (%)
50708 | 1 | < 0.1%
29110 | 1 | < 0.1%
22198 | 1 | < 0.1%
22000 | 1 | < 0.1%
20514 | 1 | < 0.1%

age
Real number (ℝ≥0)

Distinct count: 86
Unique (%): 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 52.295206666666665
Minimum: 0
Maximum: 109
Zeros: 1
Zeros (%): < 0.1%
Memory size: 1.1 MiB

Quantile statistics

Minimum: 0
5-th percentile: 29
Q1: 41
Median: 52
Q3: 63
95-th percentile: 78
Maximum: 109
Range: 109
Interquartile range (IQR): 22

Descriptive statistics

Standard deviation: 14.77186586
Coefficient of variation (CV): 0.2824707426
Kurtosis: -0.4946688326
Mean: 52.29520667
Median Absolute Deviation (MAD): 11
Skewness: 0.1889945451
Sum: 7844281
Variance: 218.2080211
Histogram with fixed size bins (bins=10)

Common values
Value | Count | Frequency (%)
49 | 3837 | 2.6%
48 | 3806 | 2.5%
50 | 3753 | 2.5%
63 | 3719 | 2.5%
47 | 3719 | 2.5%
46 | 3714 | 2.5%
53 | 3648 | 2.4%
51 | 3627 | 2.4%
52 | 3609 | 2.4%
56 | 3589 | 2.4%
Other values (76) | 112979 | 75.3%

Minimum 5 values
Value | Count | Frequency (%)
0 | 1 | < 0.1%
21 | 183 | 0.1%
22 | 434 | 0.3%
23 | 641 | 0.4%
24 | 816 | 0.5%

Maximum 5 values
Value | Count | Frequency (%)
109 | 2 | < 0.1%
107 | 1 | < 0.1%
105 | 1 | < 0.1%
103 | 3 | < 0.1%
102 | 3 | < 0.1%

NumberOfTime30-59DaysPastDueNotWorse
Real number (ℝ≥0)

HIGH CORRELATION
SKEWED
ZEROS

Distinct count: 16
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.4210333333333333
Minimum: 0
Maximum: 98
Zeros: 126018
Zeros (%): 84.0%
Memory size: 1.1 MiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
Median: 0
Q3: 0
95-th percentile: 2
Maximum: 98
Range: 98
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 4.192781272
Coefficient of variation (CV): 9.958311944
Kurtosis: 522.3765449
Mean: 0.4210333333
Median Absolute Deviation (MAD): 0
Skewness: 22.59710756
Sum: 63155
Variance: 17.57941479
Histogram with fixed size bins (bins=10)

Common values
Value | Count | Frequency (%)
0 | 126018 | 84.0%
1 | 16033 | 10.7%
2 | 4598 | 3.1%
3 | 1754 | 1.2%
4 | 747 | 0.5%
5 | 342 | 0.2%
98 | 264 | 0.2%
6 | 140 | 0.1%
7 | 54 | < 0.1%
8 | 25 | < 0.1%
Other values (6) | 25 | < 0.1%

Minimum 5 values
Value | Count | Frequency (%)
0 | 126018 | 84.0%
1 | 16033 | 10.7%
2 | 4598 | 3.1%
3 | 1754 | 1.2%
4 | 747 | 0.5%

Maximum 5 values
Value | Count | Frequency (%)
98 | 264 | 0.2%
96 | 5 | < 0.1%
13 | 1 | < 0.1%
12 | 2 | < 0.1%
11 | 1 | < 0.1%

DebtRatio
Real number (ℝ≥0)

SKEWED
ZEROS

Distinct count: 114194
Unique (%): 76.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 353.00507576386985
Minimum: 0.0
Maximum: 329664.0
Zeros: 4113
Zeros (%): 2.7%
Memory size: 1.1 MiB

Quantile statistics

Minimum: 0
5-th percentile: 0.004329004
Q1: 0.1750738323
Median: 0.366507841
Q3: 0.8682537732
95-th percentile: 2449
Maximum: 329664
Range: 329664
Interquartile range (IQR): 0.693179941

Descriptive statistics

Standard deviation: 2037.818523
Coefficient of variation (CV): 5.772774
Kurtosis: 13734.28886
Mean: 353.0050758
Median Absolute Deviation (MAD): 0.2457227975
Skewness: 95.15779287
Sum: 52950761.36
Variance: 4152704.333
Histogram with fixed size bins (bins=10)

Common values
Value | Count | Frequency (%)
0 | 4113 | 2.7%
1 | 229 | 0.2%
4 | 174 | 0.1%
2 | 170 | 0.1%
3 | 162 | 0.1%
5 | 143 | 0.1%
9 | 125 | 0.1%
10 | 117 | 0.1%
7 | 115 | 0.1%
13 | 114 | 0.1%
Other values (114184) | 144538 | 96.4%

Minimum 5 values
Value | Count | Frequency (%)
0 | 4113 | 2.7%
2.6e-05 | 1 | < 0.1%
3.69e-05 | 1 | < 0.1%
3.93e-05 | 1 | < 0.1%
6.62e-05 | 1 | < 0.1%

Maximum 5 values
Value | Count | Frequency (%)
329664 | 1 | < 0.1%
326442 | 1 | < 0.1%
307001 | 1 | < 0.1%
220516 | 1 | < 0.1%
168835 | 1 | < 0.1%

MonthlyIncome
Real number (ℝ≥0)

MISSING
SKEWED
ZEROS

Distinct count: 13594
Unique (%): 11.3%
Missing: 29731
Missing (%): 19.8%
Infinite: 0
Infinite (%): 0.0%
Mean: 6670.221237392844
Minimum: 0.0
Maximum: 3008750.0
Zeros: 1634
Zeros (%): 1.1%
Memory size: 1.1 MiB

Quantile statistics

Minimum: 0
5-th percentile: 1300
Q1: 3400
Median: 5400
Q3: 8249
95-th percentile: 14587.6
Maximum: 3008750
Range: 3008750
Interquartile range (IQR): 4849

Descriptive statistics

Standard deviation: 14384.67422
Coefficient of variation (CV): 2.15655129
Kurtosis: 19504.7054
Mean: 6670.221237
Median Absolute Deviation (MAD): 2317
Skewness: 114.0403179
Sum: 802220838
Variance: 206918852.3
Histogram with fixed size bins (bins=10)

Common values
Value | Count | Frequency (%)
5000 | 2757 | 1.8%
4000 | 2106 | 1.4%
6000 | 1934 | 1.3%
3000 | 1758 | 1.2%
0 | 1634 | 1.1%
2500 | 1551 | 1.0%
10000 | 1466 | 1.0%
3500 | 1360 | 0.9%
4500 | 1226 | 0.8%
7000 | 1223 | 0.8%
Other values (13584) | 103254 | 68.8%
(Missing) | 29731 | 19.8%

Minimum 5 values
Value | Count | Frequency (%)
0 | 1634 | 1.1%
1 | 605 | 0.4%
2 | 6 | < 0.1%
4 | 2 | < 0.1%
5 | 2 | < 0.1%

Maximum 5 values
Value | Count | Frequency (%)
3008750 | 1 | < 0.1%
1794060 | 1 | < 0.1%
1560100 | 1 | < 0.1%
1072500 | 1 | < 0.1%
835040 | 1 | < 0.1%

NumberOfOpenCreditLinesAndLoans
Real number (ℝ≥0)

ZEROS

Distinct count: 58
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 8.45276
Minimum: 0
Maximum: 58
Zeros: 1888
Zeros (%): 1.3%
Memory size: 1.1 MiB

Quantile statistics

Minimum: 0
5-th percentile: 2
Q1: 5
Median: 8
Q3: 11
95-th percentile: 18
Maximum: 58
Range: 58
Interquartile range (IQR): 6

Descriptive statistics

Standard deviation: 5.14595099
Coefficient of variation (CV): 0.6087894356
Kurtosis: 3.091066746
Mean: 8.45276
Median Absolute Deviation (MAD): 3
Skewness: 1.21531378
Sum: 1267914
Variance: 26.48081159
Histogram with fixed size bins (bins=10)

Common values
Value | Count | Frequency (%)
6 | 13614 | 9.1%
7 | 13245 | 8.8%
5 | 12931 | 8.6%
8 | 12562 | 8.4%
4 | 11609 | 7.7%
9 | 11355 | 7.6%
10 | 9624 | 6.4%
3 | 9058 | 6.0%
11 | 8321 | 5.5%
12 | 7005 | 4.7%
Other values (48) | 40676 | 27.1%

Minimum 5 values
Value | Count | Frequency (%)
0 | 1888 | 1.3%
1 | 4438 | 3.0%
2 | 6666 | 4.4%
3 | 9058 | 6.0%
4 | 11609 | 7.7%

Maximum 5 values
Value | Count | Frequency (%)
58 | 1 | < 0.1%
57 | 2 | < 0.1%
56 | 2 | < 0.1%
54 | 4 | < 0.1%
53 | 1 | < 0.1%

NumberOfTimes90DaysLate
Real number (ℝ≥0)

HIGH CORRELATION
SKEWED
ZEROS

Distinct count: 19
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.26597333333333334
Minimum: 0
Maximum: 98
Zeros: 141662
Zeros (%): 94.4%
Memory size: 1.1 MiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
Median: 0
Q3: 0
95-th percentile: 1
Maximum: 98
Range: 98
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 4.169303788
Coefficient of variation (CV): 15.67564588
Kurtosis: 537.7389446
Mean: 0.2659733333
Median Absolute Deviation (MAD): 0
Skewness: 23.08734547
Sum: 39896
Variance: 17.38309407
Histogram with fixed size bins (bins=10)

Common values
Value | Count | Frequency (%)
0 | 141662 | 94.4%
1 | 5243 | 3.5%
2 | 1555 | 1.0%
3 | 667 | 0.4%
4 | 291 | 0.2%
98 | 264 | 0.2%
5 | 131 | 0.1%
6 | 80 | 0.1%
7 | 38 | < 0.1%
8 | 21 | < 0.1%
Other values (9) | 48 | < 0.1%

Minimum 5 values
Value | Count | Frequency (%)
0 | 141662 | 94.4%
1 | 5243 | 3.5%
2 | 1555 | 1.0%
3 | 667 | 0.4%
4 | 291 | 0.2%

Maximum 5 values
Value | Count | Frequency (%)
98 | 264 | 0.2%
96 | 5 | < 0.1%
17 | 1 | < 0.1%
15 | 2 | < 0.1%
14 | 2 | < 0.1%

NumberRealEstateLoansOrLines
Real number (ℝ≥0)

ZEROS

Distinct count: 28
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1.01824
Minimum: 0
Maximum: 54
Zeros: 56188
Zeros (%): 37.5%
Memory size: 1.1 MiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
Median: 1
Q3: 2
95-th percentile: 3
Maximum: 54
Range: 54
Interquartile range (IQR): 2

Descriptive statistics

Standard deviation: 1.129770985
Coefficient of variation (CV): 1.109533101
Kurtosis: 60.47680765
Mean: 1.01824
Median Absolute Deviation (MAD): 1
Skewness: 3.482483994
Sum: 152736
Variance: 1.276382478
Histogram with fixed size bins (bins=10)

Common values
Value | Count | Frequency (%)
0 | 56188 | 37.5%
1 | 52338 | 34.9%
2 | 31522 | 21.0%
3 | 6300 | 4.2%
4 | 2170 | 1.4%
5 | 689 | 0.5%
6 | 320 | 0.2%
7 | 171 | 0.1%
8 | 93 | 0.1%
9 | 78 | 0.1%
Other values (18) | 131 | 0.1%

Minimum 5 values
Value | Count | Frequency (%)
0 | 56188 | 37.5%
1 | 52338 | 34.9%
2 | 31522 | 21.0%
3 | 6300 | 4.2%
4 | 2170 | 1.4%

Maximum 5 values
Value | Count | Frequency (%)
54 | 1 | < 0.1%
32 | 1 | < 0.1%
29 | 1 | < 0.1%
26 | 1 | < 0.1%
25 | 3 | < 0.1%

NumberOfTime60-89DaysPastDueNotWorse
Real number (ℝ≥0)

HIGH CORRELATION
SKEWED
ZEROS

Distinct count: 13
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.24038666666666667
Minimum: 0
Maximum: 98
Zeros: 142396
Zeros (%): 94.9%
Memory size: 1.1 MiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
Median: 0
Q3: 0
95-th percentile: 1
Maximum: 98
Range: 98
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 4.155179421
Coefficient of variation (CV): 17.28539889
Kurtosis: 545.6827435
Mean: 0.2403866667
Median Absolute Deviation (MAD): 0
Skewness: 23.33174312
Sum: 36058
Variance: 17.26551602
Histogram with fixed size bins (bins=10)

Common values
Value | Count | Frequency (%)
0 | 142396 | 94.9%
1 | 5731 | 3.8%
2 | 1118 | 0.7%
3 | 318 | 0.2%
98 | 264 | 0.2%
4 | 105 | 0.1%
5 | 34 | < 0.1%
6 | 16 | < 0.1%
7 | 9 | < 0.1%
96 | 5 | < 0.1%
Other values (3) | 4 | < 0.1%

Minimum 5 values
Value | Count | Frequency (%)
0 | 142396 | 94.9%
1 | 5731 | 3.8%
2 | 1118 | 0.7%
3 | 318 | 0.2%
4 | 105 | 0.1%

Maximum 5 values
Value | Count | Frequency (%)
98 | 264 | 0.2%
96 | 5 | < 0.1%
11 | 1 | < 0.1%
9 | 1 | < 0.1%
8 | 2 | < 0.1%

NumberOfDependents
Real number (ℝ≥0)

MISSING
ZEROS

Distinct count: 13
Unique (%): < 0.1%
Missing: 3924
Missing (%): 2.6%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.7572222678605657
Minimum: 0.0
Maximum: 20.0
Zeros: 86902
Zeros (%): 57.9%
Memory size: 1.1 MiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
Median: 0
Q3: 1
95-th percentile: 3
Maximum: 20
Range: 20
Interquartile range (IQR): 1

Descriptive statistics

Standard deviation: 1.115086071
Coefficient of variation (CV): 1.472600739
Kurtosis: 3.001656811
Mean: 0.7572222679
Median Absolute Deviation (MAD): 0
Skewness: 1.588242379
Sum: 110612
Variance: 1.243416947
Histogram with fixed size bins (bins=10)

Common values
Value | Count | Frequency (%)
0 | 86902 | 57.9%
1 | 26316 | 17.5%
2 | 19522 | 13.0%
3 | 9483 | 6.3%
4 | 2862 | 1.9%
5 | 746 | 0.5%
6 | 158 | 0.1%
7 | 51 | < 0.1%
8 | 24 | < 0.1%
9 | 5 | < 0.1%
Other values (3) | 7 | < 0.1%
(Missing) | 3924 | 2.6%

Minimum 5 values
Value | Count | Frequency (%)
0 | 86902 | 57.9%
1 | 26316 | 17.5%
2 | 19522 | 13.0%
3 | 9483 | 6.3%
4 | 2862 | 1.9%

Maximum 5 values
Value | Count | Frequency (%)
20 | 1 | < 0.1%
13 | 1 | < 0.1%
10 | 5 | < 0.1%
9 | 5 | < 0.1%
8 | 24 | < 0.1%

Interactions

[Figures: 121 interaction plots, one per ordered pair of the 11 numeric variables; images not preserved]

Correlations

Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1, with -1 indicating total negative linear correlation, 0 indicating no linear correlation and +1 indicating total positive linear correlation. Furthermore, r is invariant under separate changes in location and positive scale of the two variables, so for a linear relationship the steepness of the line does not affect r.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
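
The calculation just described can be sketched in plain Python. The data are made up for illustration; the 1/n factors of the covariance and the standard deviations cancel, so the sketch works with raw sums.

```python
import math

def pearson_r(x, y):
    """Covariance of x and y divided by the product of their standard deviations."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of products and squares of deviations; the common 1/n factor cancels.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical data: two exact linear relationships with different slopes.
x = [1, 2, 3, 4, 5]
rising = [2 * v + 1 for v in x]
falling = [-3 * v + 7 for v in x]

print(pearson_r(x, rising), pearson_r(x, falling))
```

Both relationships are perfectly linear, so r reaches the extremes of its range regardless of slope or offset, which illustrates the invariance under location and scale changes.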

Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at catching nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1, with -1 indicating total negative monotonic correlation, 0 indicating no monotonic correlation and +1 indicating total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
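
As a sketch of this recipe, the code below ranks both variables and then applies the Pearson formula to the ranks. It assumes tied values share the average of their rank positions, a common convention that the report does not spell out; the data are hypothetical.

```python
import math

def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions (assumed convention)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks

def pearson_r(x, y):
    """Covariance divided by the product of the standard deviations."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def spearman_rho(x, y):
    """Pearson correlation of the rank variables."""
    return pearson_r(average_ranks(x), average_ranks(y))

# Hypothetical data: a monotonic but nonlinear relationship.
x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]

print(pearson_r(x, y), spearman_rho(x, y))
```

For the cubic relationship, ρ is at its maximum while r falls short of it, which is the difference between monotonic and linear association.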

Kendall's τ

Like Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1, with -1 indicating total negative correlation, 0 indicating no correlation and +1 indicating total positive correlation.

To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
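
Counting pairs this way gives the variant usually called tau-a. The sketch below implements it in plain Python on hypothetical data, treating tied pairs as neither concordant nor discordant. Library implementations often use tie-adjusted variants, so results can differ from the report's figures when ties are common.

```python
from itertools import combinations

def kendall_tau_a(x, y):
    """(concordant - discordant) / total number of pairs, with no tie adjustment."""
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        direction = (x1 - x2) * (y1 - y2)
        if direction > 0:
            concordant += 1   # both variables move the same way
        elif direction < 0:
            discordant += 1   # the variables move in opposite ways
    n = len(x)
    return (concordant - discordant) / (n * (n - 1) / 2)

# Hypothetical data: one swapped pair out of six possible pairs.
x = [1, 2, 3, 4]
y = [1, 3, 2, 4]

print(kendall_tau_a(x, y))
```

Here five pairs are concordant and one is discordant, so τ is 4/6.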

Phik (φk)

Phik (φk) is a correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency, and reverts to the Pearson correlation coefficient when the input follows a bivariate normal distribution. Extensive documentation is available.

Missing values

[Figures: 4 missing-value plots; images not preserved]

Sample

First rows

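(index) | Unnamed: 0 | SeriousDlqin2yrs | RevolvingUtilizationOfUnsecuredLines | age | NumberOfTime30-59DaysPastDueNotWorse | DebtRatio | MonthlyIncome | NumberOfOpenCreditLinesAndLoans | NumberOfTimes90DaysLate | NumberRealEstateLoansOrLines | NumberOfTime60-89DaysPastDueNotWorse | NumberOfDependents
0 | 1 | 1 | 0.766127 | 45 | 2 | 0.802982 | 9120.0 | 13 | 0 | 6 | 0 | 2.0
1 | 2 | 0 | 0.957151 | 40 | 0 | 0.121876 | 2600.0 | 4 | 0 | 0 | 0 | 1.0
2 | 3 | 0 | 0.658180 | 38 | 1 | 0.085113 | 3042.0 | 2 | 1 | 0 | 0 | 0.0
3 | 4 | 0 | 0.233810 | 30 | 0 | 0.036050 | 3300.0 | 5 | 0 | 0 | 0 | 0.0
4 | 5 | 0 | 0.907239 | 49 | 1 | 0.024926 | 63588.0 | 7 | 0 | 1 | 0 | 0.0
5 | 6 | 0 | 0.213179 | 74 | 0 | 0.375607 | 3500.0 | 3 | 0 | 1 | 0 | 1.0
6 | 7 | 0 | 0.305682 | 57 | 0 | 5710.000000 | NaN | 8 | 0 | 3 | 0 | 0.0
7 | 8 | 0 | 0.754464 | 39 | 0 | 0.209940 | 3500.0 | 8 | 0 | 0 | 0 | 0.0
8 | 9 | 0 | 0.116951 | 27 | 0 | 46.000000 | NaN | 2 | 0 | 0 | 0 | NaN
9 | 10 | 0 | 0.189169 | 57 | 0 | 0.606291 | 23684.0 | 9 | 0 | 4 | 0 | 2.0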

Last rows

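(index) | Unnamed: 0 | SeriousDlqin2yrs | RevolvingUtilizationOfUnsecuredLines | age | NumberOfTime30-59DaysPastDueNotWorse | DebtRatio | MonthlyIncome | NumberOfOpenCreditLinesAndLoans | NumberOfTimes90DaysLate | NumberRealEstateLoansOrLines | NumberOfTime60-89DaysPastDueNotWorse | NumberOfDependents
149990 | 149991 | 0 | 0.055518 | 46 | 0 | 0.609779 | 4335.0 | 7 | 0 | 1 | 0 | 2.0
149991 | 149992 | 0 | 0.104112 | 59 | 0 | 0.477658 | 10316.0 | 10 | 0 | 2 | 0 | 0.0
149992 | 149993 | 0 | 0.871976 | 50 | 0 | 4132.000000 | NaN | 11 | 0 | 1 | 0 | 3.0
149993 | 149994 | 0 | 1.000000 | 22 | 0 | 0.000000 | 820.0 | 1 | 0 | 0 | 0 | 0.0
149994 | 149995 | 0 | 0.385742 | 50 | 0 | 0.404293 | 3400.0 | 7 | 0 | 0 | 0 | 0.0
149995 | 149996 | 0 | 0.040674 | 74 | 0 | 0.225131 | 2100.0 | 4 | 0 | 1 | 0 | 0.0
149996 | 149997 | 0 | 0.299745 | 44 | 0 | 0.716562 | 5584.0 | 4 | 0 | 1 | 0 | 2.0
149997 | 149998 | 0 | 0.246044 | 58 | 0 | 3870.000000 | NaN | 18 | 0 | 1 | 0 | 0.0
149998 | 149999 | 0 | 0.000000 | 30 | 0 | 0.000000 | 5716.0 | 4 | 0 | 0 | 0 | 0.0
149999 | 150000 | 0 | 0.850283 | 64 | 0 | 0.249908 | 8158.0 | 8 | 0 | 2 | 0 | 0.0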